Overview

Dataset statistics

Number of variables: 12
Number of observations: 1106
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 110
Duplicate rows (%): 9.9%
Total size in memory: 103.8 KiB
Average record size in memory: 96.1 B

Variable types

NUM: 11
BOOL: 1

Warnings

Dataset has 110 (9.9%) duplicate rows [Duplicates]

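The duplicate-row count can be illustrated with a small sketch. It assumes the report counts a row as a duplicate when it repeats an earlier row in every column, which is what pandas' DataFrame.duplicated() does with its default keep="first". The sample rows are rows 4 to 6 of the First rows table (values as displayed, so Days_to_deliver is rounded); rows 5 and 6 are identical. Both helper functions are hypothetical.

```python
from collections import Counter

# Rows 4, 5 and 6 of the "First rows" sample, columns in report order
# (freight_value, review_score, ..., Month, TargetVar). Values are as
# displayed in the report, so Days_to_deliver is rounded to 6 decimals.
rows = [
    (8.33, 1, 1.0, 321.0, 19.0, 14.0, 13.0, 93.22, 8151, 23.039942, 7, 0),
    (8.33, 1, 1.0, 321.0, 19.0, 14.0, 13.0, 186.44, 8151, 22.511319, 7, 0),
    (8.33, 1, 1.0, 321.0, 19.0, 14.0, 13.0, 186.44, 8151, 22.511319, 7, 0),
]

def count_duplicate_rows(rows):
    """Count rows equal to an earlier row; first occurrences are not counted."""
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return duplicates

def most_frequent_duplicates(rows, top=10):
    """Group identical rows; return (row, count) for repeated rows, most frequent first."""
    return [(row, n) for row, n in Counter(rows).most_common() if n > 1][:top]

n_dup = count_duplicate_rows(rows)
print(f"{n_dup} duplicate row(s) ({100 * n_dup / len(rows):.1f}%)")
print(most_frequent_duplicates(rows))
```

The second helper roughly mirrors the Most frequent duplicate-rows table near the end of the report, which groups identical rows and reports how often each one occurs.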
Reproduction

Analysis started: 2020-09-19 20:18:05.634047
Analysis finished: 2020-09-19 20:18:43.234887
Duration: 37.6 seconds
Software version: pandas-profiling v2.9.0

Variables

freight_value
Real number (ℝ≥0)

Distinct: 468
Distinct (%): 42.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.16462929
Minimum: 0.03
Maximum: 171.88
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 0.03
5th percentile: 7.78
Q1: 12.29
Median: 15.1
Q3: 17.52
95th percentile: 36.5675
Maximum: 171.88
Range: 171.85
Interquartile range (IQR): 5.23

Descriptive statistics

Standard deviation: 13.33676975
Coefficient of variation (CV): 0.7769914232
Kurtosis: 35.71846193
Mean: 17.16462929
Median Absolute Deviation (MAD): 2.55
Skewness: 5.110310986
Sum: 18984.08
Variance: 177.8694272
Monotonicity: Not monotonic
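Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
7.78 | 36 | 3.3%
15.1 | 28 | 2.5%
11.85 | 26 | 2.4%
15.23 | 25 | 2.3%
14.1 | 25 | 2.3%
12.55 | 23 | 2.1%
15.8 | 21 | 1.9%
9.34 | 16 | 1.4%
13.62 | 14 | 1.3%
7.39 | 13 | 1.2%
18.23 | 10 | 0.9%
13.37 | 9 | 0.8%
8.27 | 9 | 0.8%
16.68 | 9 | 0.8%
12.66 | 9 | 0.8%
8.72 | 9 | 0.8%
7.42 | 8 | 0.7%
12.61 | 8 | 0.7%
15.27 | 8 | 0.7%
15.65 | 7 | 0.6%
15.92 | 7 | 0.6%
12.72 | 7 | 0.6%
10.96 | 7 | 0.6%
7.41 | 6 | 0.5%
15.79 | 6 | 0.5%
Other values (443) | 760 | 68.7%

Minimum values
Value | Count | Frequency (%)
0.03 | 3 | 0.3%
0.48 | 1 | 0.1%
7.39 | 13 | 1.2%
7.4 | 2 | 0.2%
7.41 | 6 | 0.5%
7.42 | 8 | 0.7%
7.43 | 2 | 0.2%
7.45 | 3 | 0.3%
7.48 | 2 | 0.2%
7.49 | 2 | 0.2%

Maximum values
Value | Count | Frequency (%)
171.88 | 1 | 0.1%
121.22 | 2 | 0.2%
106.11 | 1 | 0.1%
104.77 | 4 | 0.4%
100.75 | 1 | 0.1%
86.06 | 2 | 0.2%
83.25 | 1 | 0.1%
80.43 | 1 | 0.1%
79.63 | 1 | 0.1%
77.98 | 1 | 0.1%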

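Each Frequency (%) entry in the value-count table above is the count divided by the 1106 observations, shown with one decimal, and the Other values row lumps together every value not in the top 25. A stdlib sketch that reproduces the freight_value row from the listed counts; the function names are illustrative, not part of pandas-profiling.

```python
# Top-25 counts from the freight_value value-count table above.
top_counts = [36, 28, 26, 25, 25, 23, 21, 16, 14, 13, 10,
              9, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6]
N_ROWS = 1106      # Number of observations
N_DISTINCT = 468   # Distinct values of freight_value

def frequency_pct(count, n=N_ROWS):
    """Format a count as a percentage of all rows, with one decimal."""
    return f"{100 * count / n:.1f}%"

def other_values_row(top_counts, n_distinct=N_DISTINCT, n=N_ROWS):
    """Summarise every value that is not listed in the top-k table."""
    remaining_values = n_distinct - len(top_counts)
    remaining_count = n - sum(top_counts)
    return (f"Other values ({remaining_values})", remaining_count,
            frequency_pct(remaining_count, n))

print(other_values_row(top_counts))
print(frequency_pct(36))
```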
review_score
Real number (ℝ≥0)

Distinct: 5
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7079566
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 5
95th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.762523382
Coefficient of variation (CV): 0.6508684008
Kurtosis: -1.724741366
Mean: 2.7079566
Median Absolute Deviation (MAD): 1
Skewness: 0.2593073554
Sum: 2995
Variance: 3.106488671
Monotonicity: Not monotonic
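Histogram with fixed size bins (bins=5)

Common values
Value | Count | Frequency (%)
1 | 513 | 46.4%
5 | 322 | 29.1%
4 | 123 | 11.1%
3 | 84 | 7.6%
2 | 64 | 5.8%

Minimum values
Value | Count | Frequency (%)
1 | 513 | 46.4%
2 | 64 | 5.8%
3 | 84 | 7.6%
4 | 123 | 11.1%
5 | 322 | 29.1%

Maximum values
Value | Count | Frequency (%)
5 | 322 | 29.1%
4 | 123 | 11.1%
3 | 84 | 7.6%
2 | 64 | 5.8%
1 | 513 | 46.4%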

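Because review_score has only five distinct values, the whole column can be rebuilt from the value-count table above and its order-independent descriptive statistics recomputed. This sketch assumes pandas conventions: sample variance with ddof=1, and bias-corrected (adjusted Fisher-Pearson) skewness and excess kurtosis. Under those assumptions it should match the reported mean, variance, standard deviation, CV, MAD, skewness and kurtosis to the printed precision.

```python
import math

# review_score value counts, copied from the value-count table above.
counts = {1: 513, 2: 64, 3: 84, 4: 123, 5: 322}
values = sorted(v for v, c in counts.items() for _ in range(c))
n = len(values)  # 1106 observations

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance, ddof=1
std = math.sqrt(variance)
cv = std / mean                                          # coefficient of variation
med = median(values)
mad = median([abs(x - med) for x in values])             # median absolute deviation

# Central moments (divided by n) feed the shape statistics.
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n
m4 = sum((x - mean) ** 4 for x in values) / n
g1, g2 = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)              # adjusted Fisher-Pearson G1
kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # bias-corrected excess kurtosis G2

print(f"mean={mean:.7f} variance={variance:.9f} std={std:.9f} cv={cv:.10f}")
print(f"median={med} MAD={mad} skewness={skew:.10f} kurtosis={kurt:.9f}")
```

Row order is lost when a column is rebuilt from counts, so order-dependent properties such as monotonicity cannot be checked this way.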
product_photos_qty
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.113019892
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 3
95th percentile: 5
Maximum: 9
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.54711014
Coefficient of variation (CV): 0.7321796385
Kurtosis: 1.627156397
Mean: 2.113019892
Median Absolute Deviation (MAD): 0
Skewness: 1.439740448
Sum: 2337
Variance: 2.393549786
Monotonicity: Not monotonic
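Histogram with fixed size bins (bins=9)

Common values
Value | Count | Frequency (%)
1 | 596 | 53.9%
2 | 186 | 16.8%
4 | 147 | 13.3%
3 | 89 | 8.0%
5 | 40 | 3.6%
6 | 32 | 2.9%
7 | 8 | 0.7%
8 | 6 | 0.5%
9 | 2 | 0.2%

Minimum values
Value | Count | Frequency (%)
1 | 596 | 53.9%
2 | 186 | 16.8%
3 | 89 | 8.0%
4 | 147 | 13.3%
5 | 40 | 3.6%
6 | 32 | 2.9%
7 | 8 | 0.7%
8 | 6 | 0.5%
9 | 2 | 0.2%

Maximum values
Value | Count | Frequency (%)
9 | 2 | 0.2%
8 | 6 | 0.5%
7 | 8 | 0.7%
6 | 32 | 2.9%
5 | 40 | 3.6%
4 | 147 | 13.3%
3 | 89 | 8.0%
2 | 186 | 16.8%
1 | 596 | 53.9%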

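The histograms use fixed-width bins. Judging from the captions in this report (bins=5 for review_score, 9 for product_photos_qty, 12 for Month, 46 for category_count and 50 elsewhere), the bin count appears to be min(50, number of distinct values); that rule is inferred from this report, not taken from the pandas-profiling source. The sketch below bins product_photos_qty, rebuilt from its value counts, into equal-width bins in the NumPy style (half-open bins, last bin closed). With nine bins over [1, 9], each integer lands in its own bin.

```python
# product_photos_qty value counts, from the value-count table above.
counts = {1: 596, 2: 186, 3: 89, 4: 147, 5: 40, 6: 32, 7: 8, 8: 6, 9: 2}
values = [v for v, c in counts.items() for _ in range(c)]

def fixed_width_histogram(xs, bins):
    """Equal-width bins over [min, max]; the last bin also includes max."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    hist = [0] * bins
    for x in xs:
        i = int((x - lo) / width)
        hist[min(i, bins - 1)] += 1  # clamp x == max into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return hist, edges

# Assumed rule, inferred from the bin counts in this report's captions.
bins = min(50, len(counts))
hist, edges = fixed_width_histogram(values, bins)
print(bins, hist)
```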
product_weight_g
Real number (ℝ≥0)

Distinct: 190
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1176.20434
Minimum: 50
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 50
5th percentile: 100
Q1: 180
Median: 250
Q3: 700
95th percentile: 5937.5
Maximum: 30000
Range: 29950
Interquartile range (IQR): 520

Descriptive statistics

Standard deviation: 2836.023224
Coefficient of variation (CV): 2.411165413
Kurtosis: 30.17575623
Mean: 1176.20434
Median Absolute Deviation (MAD): 100
Skewness: 4.878086211
Sum: 1300882
Variance: 8043027.725
Monotonicity: Not monotonic
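Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
250 | 213 | 19.3%
200 | 112 | 10.1%
150 | 78 | 7.1%
167 | 37 | 3.3%
180 | 32 | 2.9%
173 | 32 | 2.9%
500 | 30 | 2.7%
175 | 30 | 2.7%
400 | 28 | 2.5%
900 | 28 | 2.5%
100 | 26 | 2.4%
50 | 24 | 2.2%
300 | 19 | 1.7%
600 | 18 | 1.6%
1000 | 17 | 1.5%
1025 | 14 | 1.3%
156 | 14 | 1.3%
700 | 10 | 0.9%
550 | 10 | 0.9%
1590 | 9 | 0.8%
800 | 9 | 0.8%
183 | 9 | 0.8%
450 | 8 | 0.7%
188 | 7 | 0.6%
1300 | 7 | 0.6%
Other values (165) | 285 | 25.8%

Minimum values
Value | Count | Frequency (%)
50 | 24 | 2.2%
60 | 1 | 0.1%
65 | 3 | 0.3%
75 | 1 | 0.1%
85 | 2 | 0.2%
90 | 3 | 0.3%
100 | 26 | 2.4%
110 | 1 | 0.1%
115 | 1 | 0.1%
125 | 1 | 0.1%

Maximum values
Value | Count | Frequency (%)
30000 | 1 | 0.1%
28500 | 1 | 0.1%
22800 | 1 | 0.1%
20850 | 1 | 0.1%
18550 | 1 | 0.1%
16515 | 1 | 0.1%
16000 | 2 | 0.2%
15900 | 1 | 0.1%
15550 | 2 | 0.2%
14813 | 1 | 0.1%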

product_length_cm
Real number (ℝ≥0)

Distinct: 54
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.00904159
Minimum: 14
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 14
5th percentile: 16
Q1: 16
Median: 18
Q3: 28
95th percentile: 50
Maximum: 105
Range: 91
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 13.32258687
Coefficient of variation (CV): 0.5548987376
Kurtosis: 9.474699973
Mean: 24.00904159
Median Absolute Deviation (MAD): 2
Skewness: 2.708880279
Sum: 26554
Variance: 177.4913209
Monotonicity: Not monotonic
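Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
16 | 368 | 33.3%
17 | 146 | 13.2%
18 | 89 | 8.0%
20 | 83 | 7.5%
30 | 48 | 4.3%
40 | 45 | 4.1%
19 | 28 | 2.5%
21 | 27 | 2.4%
22 | 22 | 2.0%
38 | 15 | 1.4%
23 | 15 | 1.4%
25 | 15 | 1.4%
35 | 15 | 1.4%
24 | 13 | 1.2%
36 | 12 | 1.1%
50 | 11 | 1.0%
31 | 10 | 0.9%
28 | 9 | 0.8%
60 | 9 | 0.8%
45 | 9 | 0.8%
33 | 9 | 0.8%
26 | 8 | 0.7%
29 | 8 | 0.7%
39 | 7 | 0.6%
37 | 7 | 0.6%
Other values (29) | 78 | 7.1%

Minimum values
Value | Count | Frequency (%)
14 | 6 | 0.5%
15 | 1 | 0.1%
16 | 368 | 33.3%
17 | 146 | 13.2%
18 | 89 | 8.0%
19 | 28 | 2.5%
20 | 83 | 7.5%
21 | 27 | 2.4%
22 | 22 | 2.0%
23 | 15 | 1.4%

Maximum values
Value | Count | Frequency (%)
105 | 4 | 0.4%
100 | 3 | 0.3%
82 | 2 | 0.2%
80 | 2 | 0.2%
78 | 1 | 0.1%
77 | 2 | 0.2%
75 | 1 | 0.1%
70 | 1 | 0.1%
69 | 1 | 0.1%
68 | 4 | 0.4%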

product_height_cm
Real number (ℝ≥0)

Distinct: 55
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.43851718
Minimum: 2
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 2
5th percentile: 2
Q1: 5
Median: 11
Q3: 16
95th percentile: 35
Maximum: 105
Range: 103
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 12.58413564
Coefficient of variation (CV): 0.9364229307
Kurtosis: 14.72935289
Mean: 13.43851718
Median Absolute Deviation (MAD): 6
Skewness: 3.026996798
Sum: 14863
Variance: 158.3604698
Monotonicity: Not monotonic
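Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
2 | 224 | 20.3%
11 | 103 | 9.3%
13 | 100 | 9.0%
10 | 82 | 7.4%
20 | 64 | 5.8%
12 | 55 | 5.0%
16 | 49 | 4.4%
18 | 37 | 3.3%
8 | 36 | 3.3%
6 | 31 | 2.8%
5 | 28 | 2.5%
14 | 25 | 2.3%
9 | 23 | 2.1%
15 | 23 | 2.1%
4 | 22 | 2.0%
7 | 19 | 1.7%
17 | 18 | 1.6%
25 | 16 | 1.4%
30 | 15 | 1.4%
35 | 12 | 1.1%
19 | 12 | 1.1%
24 | 11 | 1.0%
40 | 11 | 1.0%
3 | 10 | 0.9%
47 | 6 | 0.5%
Other values (30) | 74 | 6.7%

Minimum values
Value | Count | Frequency (%)
2 | 224 | 20.3%
3 | 10 | 0.9%
4 | 22 | 2.0%
5 | 28 | 2.5%
6 | 31 | 2.8%
7 | 19 | 1.7%
8 | 36 | 3.3%
9 | 23 | 2.1%
10 | 82 | 7.4%
11 | 103 | 9.3%

Maximum values
Value | Count | Frequency (%)
105 | 4 | 0.4%
96 | 1 | 0.1%
87 | 1 | 0.1%
80 | 1 | 0.1%
77 | 1 | 0.1%
66 | 2 | 0.2%
65 | 1 | 0.1%
63 | 1 | 0.1%
61 | 1 | 0.1%
60 | 3 | 0.3%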

product_width_cm
Real number (ℝ≥0)

Distinct: 53
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.43490054
Minimum: 11
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 11
5th percentile: 11
Q1: 13
Median: 16
Q3: 20
95th percentile: 40
Maximum: 105
Range: 94
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 10.70145306
Coefficient of variation (CV): 0.5506307087
Kurtosis: 11.18706585
Mean: 19.43490054
Median Absolute Deviation (MAD): 4
Skewness: 2.741353351
Sum: 21495
Variance: 114.5210976
Monotonicity: Not monotonic
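Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
20 | 176 | 15.9%
11 | 174 | 15.7%
15 | 129 | 11.7%
13 | 93 | 8.4%
12 | 85 | 7.7%
14 | 69 | 6.2%
16 | 54 | 4.9%
40 | 43 | 3.9%
18 | 39 | 3.5%
25 | 30 | 2.7%
17 | 29 | 2.6%
30 | 28 | 2.5%
23 | 18 | 1.6%
35 | 14 | 1.3%
32 | 12 | 1.1%
24 | 11 | 1.0%
31 | 8 | 0.7%
22 | 7 | 0.6%
19 | 7 | 0.6%
33 | 6 | 0.5%
45 | 6 | 0.5%
21 | 6 | 0.5%
36 | 6 | 0.5%
26 | 5 | 0.5%
29 | 4 | 0.4%
Other values (28) | 47 | 4.2%

Minimum values
Value | Count | Frequency (%)
11 | 174 | 15.7%
12 | 85 | 7.7%
13 | 93 | 8.4%
14 | 69 | 6.2%
15 | 129 | 11.7%
16 | 54 | 4.9%
17 | 29 | 2.6%
18 | 39 | 3.5%
19 | 7 | 0.6%
20 | 176 | 15.9%

Maximum values
Value | Count | Frequency (%)
105 | 1 | 0.1%
100 | 1 | 0.1%
82 | 1 | 0.1%
79 | 1 | 0.1%
73 | 1 | 0.1%
68 | 1 | 0.1%
67 | 2 | 0.2%
63 | 2 | 0.2%
62 | 2 | 0.2%
61 | 1 | 0.1%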

payment_value
Real number (ℝ≥0)

Distinct: 681
Distinct (%): 61.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180.7053888
Minimum: 0.33
Maximum: 4809.44
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 0.33
5th percentile: 29.31
Q1: 68.25
Median: 88.91
Q3: 162.55
95th percentile: 536.0075
Maximum: 4809.44
Range: 4809.11
Interquartile range (IQR): 94.3

Descriptive statistics

Standard deviation: 354.4144625
Coefficient of variation (CV): 1.961283307
Kurtosis: 64.50049999
Mean: 180.7053888
Median Absolute Deviation (MAD): 37.825
Skewness: 6.866266749
Sum: 199860.16
Variance: 125609.6112
Monotonicity: Not monotonic
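Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
165.8 | 20 | 1.8%
162.55 | 18 | 1.6%
85.62 | 14 | 1.3%
86.9 | 10 | 0.9%
87.32 | 8 | 0.7%
69.14 | 8 | 0.7%
93.65 | 7 | 0.6%
81.51 | 6 | 0.5%
1949.52 | 6 | 0.5%
140.16 | 6 | 0.5%
133.98 | 6 | 0.5%
90.99 | 6 | 0.5%
96.72 | 6 | 0.5%
85.41 | 6 | 0.5%
64.17 | 6 | 0.5%
62.78 | 6 | 0.5%
9.39 | 6 | 0.5%
258 | 6 | 0.5%
37.75 | 6 | 0.5%
355.1 | 5 | 0.5%
45.09 | 5 | 0.5%
80.2 | 5 | 0.5%
89.28 | 5 | 0.5%
975.05 | 5 | 0.5%
120.35 | 5 | 0.5%
Other values (656) | 919 | 83.1%

Minimum values
Value | Count | Frequency (%)
0.33 | 1 | 0.1%
3.77 | 1 | 0.1%
6.42 | 1 | 0.1%
6.48 | 1 | 0.1%
7.1 | 1 | 0.1%
7.32 | 1 | 0.1%
8.26 | 1 | 0.1%
8.79 | 1 | 0.1%
9.39 | 6 | 0.5%
9.46 | 1 | 0.1%

Maximum values
Value | Count | Frequency (%)
4809.44 | 2 | 0.2%
2442.82 | 1 | 0.1%
2419.2 | 1 | 0.1%
2404.72 | 2 | 0.2%
2026.54 | 1 | 0.1%
1950.2 | 5 | 0.5%
1949.52 | 6 | 0.5%
1841.75 | 1 | 0.1%
1818.23 | 1 | 0.1%
1488.14 | 1 | 0.1%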

category_count
Real number (ℝ≥0)

Distinct: 46
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6633.004521
Minimum: 39
Maximum: 11990
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 39
5th percentile: 708.5
Q1: 4726
Median: 7380
Q3: 8833
95th percentile: 10030
Maximum: 11990
Range: 11951
Interquartile range (IQR): 4107

Descriptive statistics

Standard deviation: 2750.653738
Coefficient of variation (CV): 0.414691974
Kurtosis: -0.2296188104
Mean: 6633.004521
Median Absolute Deviation (MAD): 1625
Skewness: -0.6153387955
Sum: 7336103
Variance: 7566095.986
Monotonicity: Not monotonic
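Histogram with fixed size bins (bins=46)

Common values
Value | Count | Frequency (%)
6213 | 221 | 20.0%
8151 | 207 | 18.7%
10030 | 143 | 12.9%
9005 | 78 | 7.1%
7380 | 64 | 5.8%
4726 | 57 | 5.2%
8833 | 51 | 4.6%
4281 | 34 | 3.1%
4400 | 32 | 2.9%
3589 | 22 | 2.0%
3204 | 20 | 1.8%
4590 | 20 | 1.8%
11990 | 18 | 1.6%
3999 | 17 | 1.5%
2625 | 14 | 1.3%
2847 | 11 | 1.0%
719 | 11 | 1.0%
1192 | 10 | 0.9%
705 | 8 | 0.7%
565 | 7 | 0.6%
2170 | 6 | 0.5%
2030 | 5 | 0.5%
199 | 5 | 0.5%
247 | 4 | 0.4%
1163 | 4 | 0.4%
Other values (21) | 37 | 3.3%

Minimum values
Value | Count | Frequency (%)
39 | 1 | 0.1%
71 | 3 | 0.3%
145 | 1 | 0.1%
155 | 1 | 0.1%
199 | 5 | 0.5%
219 | 1 | 0.1%
247 | 4 | 0.4%
271 | 3 | 0.3%
272 | 4 | 0.4%
278 | 2 | 0.2%

Maximum values
Value | Count | Frequency (%)
11990 | 18 | 1.6%
10030 | 143 | 12.9%
9005 | 78 | 7.1%
8833 | 51 | 4.6%
8151 | 207 | 18.7%
7380 | 64 | 5.8%
6213 | 221 | 20.0%
4726 | 57 | 5.2%
4590 | 20 | 1.8%
4400 | 32 | 2.9%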

Days_to_deliver
Real number (ℝ≥0)

Distinct: 961
Distinct (%): 86.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.8666836
Minimum: 2.22818287
Maximum: 144.8952431
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 2.22818287
5th percentile: 10.48011285
Q1: 17.64739005
Median: 22.4016088
Q3: 26.54847222
95th percentile: 35.027636
Maximum: 144.8952431
Range: 142.6670602
Interquartile range (IQR): 8.901082176

Descriptive statistics

Standard deviation: 9.546703121
Coefficient of variation (CV): 0.4174939965
Kurtosis: 32.01416219
Mean: 22.8666836
Median Absolute Deviation (MAD): 4.338993056
Skewness: 3.417160684
Sum: 25290.55206
Variance: 91.13954049
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
19.46091435 | 9 | 0.8%
22.4016088 | 7 | 0.6%
28.52240741 | 6 | 0.5%
20.58934028 | 6 | 0.5%
18.66866898 | 6 | 0.5%
13.08435185 | 6 | 0.5%
24.45289352 | 5 | 0.5%
15.01650463 | 5 | 0.5%
18.38804398 | 5 | 0.5%
15.28127315 | 5 | 0.5%
15.41814815 | 5 | 0.5%
24.41399306 | 5 | 0.5%
32.05636574 | 5 | 0.5%
22.20247685 | 5 | 0.5%
13.60278935 | 4 | 0.4%
23.56354167 | 4 | 0.4%
21.25967593 | 4 | 0.4%
18.21923611 | 3 | 0.3%
34.32572917 | 3 | 0.3%
41.2146412 | 3 | 0.3%
26.32194444 | 3 | 0.3%
23.08686343 | 3 | 0.3%
26.26618056 | 2 | 0.2%
26.1012963 | 2 | 0.2%
23.21398148 | 2 | 0.2%
Other values (936) | 993 | 89.8%

Minimum values
Value | Count | Frequency (%)
2.22818287 | 1 | 0.1%
2.251909722 | 1 | 0.1%
2.549513889 | 1 | 0.1%
3.364131944 | 1 | 0.1%
4.056122685 | 1 | 0.1%
4.305648148 | 1 | 0.1%
4.709143519 | 1 | 0.1%
4.961180556 | 1 | 0.1%
5.371261574 | 2 | 0.2%
5.52962963 | 1 | 0.1%

Maximum values
Value | Count | Frequency (%)
144.8952431 | 1 | 0.1%
106.992338 | 1 | 0.1%
78.04864583 | 1 | 0.1%
70.38107639 | 1 | 0.1%
68.44149306 | 1 | 0.1%
63.52545139 | 2 | 0.2%
60.60930556 | 1 | 0.1%
60.52143519 | 1 | 0.1%
59.34726852 | 1 | 0.1%
57.46989583 | 1 | 0.1%

Month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.910488246
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 6
Q3: 8
95th percentile: 11.75
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.992016275
Coefficient of variation (CV): 0.5062215084
Kurtosis: -0.8753779789
Mean: 5.910488246
Median Absolute Deviation (MAD): 3
Skewness: 0.3670529223
Sum: 6537
Variance: 8.95216139
Monotonicity: Not monotonic
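Histogram with fixed size bins (bins=12)

Common values
Value | Count | Frequency (%)
3 | 244 | 22.1%
8 | 169 | 15.3%
5 | 117 | 10.6%
7 | 101 | 9.1%
6 | 89 | 8.0%
2 | 78 | 7.1%
4 | 74 | 6.7%
12 | 56 | 5.1%
11 | 55 | 5.0%
9 | 53 | 4.8%
10 | 39 | 3.5%
1 | 31 | 2.8%

Minimum values
Value | Count | Frequency (%)
1 | 31 | 2.8%
2 | 78 | 7.1%
3 | 244 | 22.1%
4 | 74 | 6.7%
5 | 117 | 10.6%
6 | 89 | 8.0%
7 | 101 | 9.1%
8 | 169 | 15.3%
9 | 53 | 4.8%
10 | 39 | 3.5%

Maximum values
Value | Count | Frequency (%)
12 | 56 | 5.1%
11 | 55 | 5.0%
10 | 39 | 3.5%
9 | 53 | 4.8%
8 | 169 | 15.3%
7 | 101 | 9.1%
6 | 89 | 8.0%
5 | 117 | 10.6%
4 | 74 | 6.7%
3 | 244 | 22.1%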

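The Month quantile statistics can be reproduced from its value counts. pandas' Series.quantile interpolates linearly between the two nearest ranks by default, which explains why the 95th percentile is 11.75 even though every month is an integer. A sketch under the assumption that the report uses that default, also computing the IQR and the median absolute deviation; the quantile helper is hypothetical.

```python
import math

# Month value counts, copied from the Month value-count table above.
counts = {1: 31, 2: 78, 3: 244, 4: 74, 5: 117, 6: 89,
          7: 101, 8: 169, 9: 53, 10: 39, 11: 55, 12: 56}
values = sorted(v for v, c in counts.items() for _ in range(c))

def quantile(sorted_xs, q):
    """Linear interpolation between the two nearest ranks (pandas' default)."""
    pos = (len(sorted_xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)

qs = {q: quantile(values, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
iqr = qs[0.75] - qs[0.25]
median = qs[0.5]
mad = quantile(sorted(abs(x - median) for x in values), 0.5)

print(qs)
print("IQR:", iqr, "MAD:", mad)
```

For the 95th percentile the position is 1105 × 0.95 = 1049.75; the 1050th and 1051st sorted values are 11 and 12, so the result is 11 + 0.75 × (12 - 11) = 11.75.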
TargetVar
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB

Value | Count | Frequency (%)
1 | 553 | 50.0%
0 | 553 | 50.0%

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for points lying exactly on a non-horizontal line, the slope of the line does not affect r, which is always +1 or -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.

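A minimal stdlib sketch of the Pearson definition, on illustrative data, that also checks the location/scale invariance and the |r| = 1 case for points on a line. pearson_r is a hypothetical helper; in pandas the same coefficient comes from DataFrame.corr(method="pearson").

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/(n-1) normalisation factors cancel

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)
# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson_r([10 * x - 7 for x in xs], [0.5 * y + 3 for y in ys])
# Points exactly on a line give |r| = 1, whatever the (non-zero) slope.
r_line = pearson_r(xs, [-3 * x + 1 for x in xs])
print(r, r_shifted, r_line)
```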
Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.

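Spearman's ρ is Pearson's r applied to ranks. This sketch assumes the usual convention that tied values share the average of the ranks they span (as pandas' Series.rank does with its default method="average"); the helper names are hypothetical. A cubic relationship is monotonic but not linear, so ρ = 1 while r < 1.

```python
import math

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(xs):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but non-linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```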
Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

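To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form; the τ-b variant, which is the default in SciPy's kendalltau, adjusts the denominator for tied values.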

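A direct sketch of the pair-counting definition, in its τ-a form. It is O(n²), treats tied pairs as neither concordant nor discordant and still counts them in the denominator, so it does not correct for ties; kendall_tau_a is a hypothetical helper and the data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, n(n-1)/2."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 5, 4]
print(kendall_tau_a(xs, ys))  # 7 concordant, 3 discordant pairs -> 0.4
```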
Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package has extensive documentation.

Sample

First rows

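Index | freight_value | review_score | product_photos_qty | product_weight_g | product_length_cm | product_height_cm | product_width_cm | payment_value | category_count | Days_to_deliver | Month | TargetVar
0 | 21.10 | 5 | 1.0 | 1383.0 | 50.0 | 10.0 | 40.0 | 108.00 | 11990 | 23.389641 | 3 | 0
1 | 15.38 | 1 | 1.0 | 200.0 | 16.0 | 12.0 | 11.0 | 55.28 | 7380 | 13.396435 | 8 | 0
2 | 9.98 | 1 | 1.0 | 173.0 | 18.0 | 13.0 | 12.0 | 90.98 | 8151 | 9.419051 | 3 | 0
3 | 15.38 | 1 | 1.0 | 173.0 | 18.0 | 13.0 | 12.0 | 105.17 | 8151 | 27.405382 | 2 | 0
4 | 8.33 | 1 | 1.0 | 321.0 | 19.0 | 14.0 | 13.0 | 93.22 | 8151 | 23.039942 | 7 | 0
5 | 8.33 | 1 | 1.0 | 321.0 | 19.0 | 14.0 | 13.0 | 186.44 | 8151 | 22.511319 | 7 | 0
6 | 8.33 | 1 | 1.0 | 321.0 | 19.0 | 14.0 | 13.0 | 186.44 | 8151 | 22.511319 | 7 | 0
7 | 17.60 | 1 | 2.0 | 1750.0 | 37.0 | 22.0 | 40.0 | 59.24 | 4590 | 31.618437 | 3 | 0
8 | 17.60 | 1 | 2.0 | 1750.0 | 37.0 | 22.0 | 40.0 | 8.26 | 4590 | 31.618437 | 3 | 0
9 | 40.37 | 1 | 1.0 | 6550.0 | 20.0 | 20.0 | 20.0 | 189.37 | 8151 | 30.375370 | 2 | 0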

Last rows

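Index | freight_value | review_score | product_photos_qty | product_weight_g | product_length_cm | product_height_cm | product_width_cm | payment_value | category_count | Days_to_deliver | Month | TargetVar
1096 | 14.10 | 5 | 1.0 | 150.0 | 16.0 | 16.0 | 11.0 | 40.00 | 9005 | 21.444734 | 10 | 1
1097 | 12.65 | 4 | 1.0 | 180.0 | 17.0 | 10.0 | 13.0 | 87.55 | 8151 | 26.306262 | 3 | 1
1098 | 7.45 | 5 | 4.0 | 300.0 | 16.0 | 5.0 | 12.0 | 34.35 | 7380 | 9.141829 | 7 | 1
1099 | 18.43 | 2 | 4.0 | 250.0 | 16.0 | 2.0 | 20.0 | 96.43 | 6213 | 28.150486 | 5 | 1
1100 | 7.78 | 5 | 2.0 | 250.0 | 16.0 | 2.0 | 20.0 | 62.78 | 6213 | 19.431215 | 1 | 1
1101 | 14.30 | 4 | 4.0 | 250.0 | 16.0 | 2.0 | 20.0 | 92.30 | 6213 | 28.389225 | 11 | 1
1102 | 22.11 | 1 | 4.0 | 188.0 | 17.0 | 6.0 | 12.0 | 48.08 | 4726 | 20.567755 | 8 | 1
1103 | 14.10 | 5 | 1.0 | 150.0 | 16.0 | 16.0 | 11.0 | 40.00 | 9005 | 22.356146 | 10 | 1
1104 | 18.23 | 5 | 1.0 | 150.0 | 16.0 | 17.0 | 22.0 | 32.43 | 9005 | 23.614271 | 4 | 1
1105 | 15.43 | 1 | 4.0 | 250.0 | 16.0 | 2.0 | 20.0 | 93.43 | 6213 | 33.230961 | 7 | 1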

Duplicate rows

Most frequent

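Index | freight_value | review_score | product_photos_qty | product_weight_g | product_length_cm | product_height_cm | product_width_cm | payment_value | category_count | Days_to_deliver | Month | TargetVar | count
12 | 9.34 | 1 | 1.0 | 900.0 | 40.0 | 5.0 | 40.0 | 133.98 | 8833 | 18.668669 | 3 | 0 | 6
23 | 13.37 | 2 | 1.0 | 900.0 | 40.0 | 8.0 | 40.0 | 140.16 | 8833 | 28.522407 | 1 | 1 | 6
32 | 15.23 | 5 | 2.0 | 175.0 | 19.0 | 11.0 | 14.0 | 9.39 | 4726 | 22.401609 | 5 | 1 | 6
36 | 15.92 | 1 | 1.0 | 315.0 | 14.0 | 13.0 | 13.0 | 1949.52 | 4590 | 20.589340 | 5 | 0 | 6
1 | 7.39 | 1 | 1.0 | 50.0 | 20.0 | 5.0 | 15.0 | 86.90 | 3204 | 15.418148 | 8 | 0 | 5
5 | 7.78 | 1 | 1.0 | 50.0 | 20.0 | 20.0 | 18.0 | 211.40 | 8151 | 15.281273 | 6 | 0 | 5
15 | 10.96 | 2 | 6.0 | 400.0 | 16.0 | 12.0 | 12.0 | 144.30 | 7380 | 32.056366 | 5 | 0 | 5
17 | 11.85 | 1 | 2.0 | 100.0 | 28.0 | 17.0 | 11.0 | 86.90 | 7380 | 22.202477 | 9 | 0 | 5
24 | 14.08 | 1 | 2.0 | 900.0 | 40.0 | 8.0 | 40.0 | 120.35 | 8833 | 18.388044 | 12 | 1 | 5
27 | 15.01 | 1 | 1.0 | 400.0 | 35.0 | 35.0 | 25.0 | 975.05 | 8151 | 24.413993 | 12 | 0 | 5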